This article explores how to incorporate images into a RAG (Retrieval-Augmented Generation) knowledgebase using Large Language Models (LLMs) with vision capabilities. It provides a step-by-step guide to collecting, uploading, and transcribing images for a richer and more detailed knowledgebase.